► Submitting Sequence Data to
GenBank 

The most important source of new data for 
GenBank® is direct submissions from 
scientists. GenBank depends on its 
contributors to help keep the database as 
comprehensive, current, and accurate as 
possible. NCBI provides timely and 
accurate processing and biological review 
of new entries and updates to existing 
entries, and is ready to assist authors who 
have new data to submit. 

NOTE: The 'Authorin' submission tool and the e-mail
submission form were phased out on
December 31, 1998, and submissions made with
those tools are no longer accepted as of that
date. Instead, please use the improved
submission tools, BankIt and Sequin, described
below.



► Receiving an accession number for your
manuscript 

Most journals now expect that DNA and amino acid sequences 
that appear in articles will be submitted to a sequence database 
before publication. Soon after submission, you will receive an 
accession number from the database which you will be able to 
use in your article to refer to the sequence. Please be aware that 
it is only necessary to submit the sequence to one database, 
whichever one is most convenient, without regard for where the 
sequence may be published. Data exchange between GenBank,
EMBL, and DDBJ occurs daily. Sequence data submitted in
advance of publication can be kept confidential if requested. 

Described below are the ways of submitting DNA
sequences to GenBank. There are two principal
tools: BankIt and Sequin. BankIt is a Web submission tool and
is recommended for simple submissions. With BankIt you can
indicate coding regions on an mRNA along with a product and
gene name. For more control over annotating your entry,
for segmented records, or for very long entries, Sequin, a
stand-alone submission tool, is suggested.

GenBank will provide you with an accession number to identify
your sequence, usually within two working days, if the
submission is received via electronic mail. This accession
number serves as confirmation that you have submitted your
data, and allows the community to retrieve the data upon reading
the journal article.

The accession number should be included in your manuscript, 
preferably in a footnote on the first page of the article, or as 
required by individual journal procedures. 

► BankIt - submitting via the WWW

NCBI has developed a WWW form, called BankIt, for convenient
and quick submission of sequence data.

BankIt allows you to enter sequence information into a form,
edit as necessary, and add biological annotation (e.g., coding
regions, mRNA features). BankIt transforms your data into
GenBank format for your review, and when your record is
completed, it can be submitted directly to GenBank. You have
the option of adding information by using text boxes to describe
in your own words the source of the sequence and its biological
features. The GenBank annotation staff reviews the submitted
textual information, incorporates it into the appropriate structured
fields, and returns the record by e-mail for your review.

BankIt is compatible with Netscape clients for Unix, Macs, and
PCs. In addition, Internet Explorer for the PC and Mac has
been used successfully.

► Sequin - stand-alone software for the Mac, 
PC/Windows, and UNIX 

If you do not have access to the WWW, NCBI offers a
stand-alone submission program called Sequin.

Sequin is an interactive, graphically-oriented program based 
on screen forms and controlled vocabularies that guides you 
through the process of entering your sequence and providing 
biological and bibliographic annotation. Sequin is designed to 
simplify the sequence submission process and to provide 
graphical viewing and editing options. It incorporates robust error 
checking and accommodates very long sequences and complex 
annotations. 

► Special submissions - genomes, batch 
sequences, alignments 

Sequin can be used for the submission of individual or small 
numbers of sequences. However, it was also designed to 
facilitate special types of submissions, and should be used 
instead of BankIt for the following types of submissions:
genomes and other very long sequences; multiple sequences 
such as batch submissions and segmented sets; and 
population/phylogenetic/mutation studies. 

When preparing the submission of a genome, you can import
the complete genome sequence into Sequin as well as a file
containing the amino acid translations in FASTA format, if
available. Sequin will automatically annotate the coding region
intervals based on the translations, and you can use Sequin to
make further complex annotations. Sequin can also accept
feature annotations in tab-delimited tables. Since the final
submission file (*.sqn) will be quite large, please send it to the
GenBank staff via FTP rather than by e-mail. To request a
temporary FTP directory, please contact
genomes@ncbi.nlm.nih.gov.

When preparing a submission that contains multiple
sequences, you can import a single file containing all the
sequences in FASTA format, or as alignments in FASTA+GAP,
PHYLIP, or NEXUS format. In addition, for
population/phylogenetic/mutation studies, you can annotate one
sequence and propagate the features onto the other sequences.
When you complete the submission and select the 'prepare
submission' option in the 'File' menu, Sequin will prepare a
single *.sqn file that contains all the sequences. Send the *.sqn
file by e-mail to:

gb-sub@ncbi.nlm.nih.gov

If you are submitting two or more Sequin files, each of which
contains multiple sequences, send each *.sqn file in a separate
e-mail message.

Please refer to the Sequin Quick Guide and documentation for
additional information, both of which are accessible from the
Sequin Web page.
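
As a rough illustration of the multi-sequence import described above, the
following Python sketch combines several sequences into a single
FASTA-format text that could then be imported into Sequin. The function
name, the identifiers seq1 and seq2, and the 70-character line width are
assumptions made for this example, not GenBank or Sequin requirements;
consult the Sequin documentation for the definition-line conventions it
expects.

```python
def to_fasta(records, width=70):
    """Render (identifier, sequence) pairs as one multi-FASTA string.

    Hypothetical helper for illustration only; it is not part of Sequin.
    """
    blocks = []
    for seq_id, seq in records:
        # Remove any whitespace inside the sequence and normalize the case.
        seq = "".join(seq.split()).upper()
        if not seq:
            raise ValueError(f"empty sequence for {seq_id!r}")
        if not seq_id or any(ch.isspace() for ch in seq_id):
            raise ValueError(f"identifier must be non-empty, no spaces: {seq_id!r}")
        # Wrap the sequence into fixed-width lines under its '>' definition line.
        lines = [seq[i:i + width] for i in range(0, len(seq), width)]
        blocks.append(">" + seq_id + "\n" + "\n".join(lines))
    return "\n".join(blocks) + "\n"


if __name__ == "__main__":
    # seq1 and seq2 are placeholder identifiers, not real records.
    print(to_fasta([("seq1", "acgtacgt"), ("seq2", "GGCC TTAA")]), end="")
```

The resulting text can be saved to a single file and imported in one step,
rather than entering each sequence separately.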

► Sending the Data to GenBank 

When using BankIt, the prepared sequence entries are
submitted directly to GenBank through the WWW.

When using Sequin, the output files for direct submission 
should be sent to GenBank by electronic mail to: 

gb-sub@ncbi.nlm.nih.gov 

As an alternative, the submission file can be copied to floppy 
disk and mailed to GenBank Submissions at: 

GenBank Submissions 

National Center for Biotechnology Information 
National Library of Medicine 
Bldg. 38A, Room 8N-803
Bethesda, MD 20894 

Please label the disk with your name and file name and
indicate whether it is a PC or Mac disk.

► Updates 

NCBI processes update requests as well as new submissions. 
You can provide additional annotation, correct errors or 
omissions, or request the release of your "hold-until-published"
record. BankIt or Sequin may be used for updates, or you can
request changes as text in the body of an e-mail message. Be 
sure to give the accession number of the sequence to be 
updated along with all update information. Send it to: 

update@ncbi.nlm.nih.gov 
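
For an update requested as text in the body of an e-mail message, a
minimal sketch of composing such a message with Python's standard
email library might look like the following. The subject line, the body
layout, the sender address, and the accession value are assumptions made
for illustration; the text above only requires that the accession number
accompany the update information. The sketch builds the message but does
not send it.

```python
from email.message import EmailMessage

# Address given in the text above for update requests.
UPDATE_ADDRESS = "update@ncbi.nlm.nih.gov"


def build_update_request(sender, accession, changes):
    """Compose (but do not send) a plain-text update request.

    Hypothetical helper: the subject and body layout are illustrative only.
    """
    accession = accession.strip()
    if not accession:
        raise ValueError("an accession number is required for an update")
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = UPDATE_ADDRESS
    msg["Subject"] = f"Update request for accession {accession}"
    body = [f"Accession number: {accession}", "", "Requested changes:"]
    body += [f"- {change}" for change in changes]
    msg.set_content("\n".join(body))
    return msg


if __name__ == "__main__":
    # "X00000" is a placeholder, not a real accession number.
    request = build_update_request(
        "author@example.org",
        "X00000",
        ["Correct the gene name in the CDS feature."],
    )
    print(request)
```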

Submitters of a record maintain editorial control of that record. 
Any third party update information will be forwarded to the 
submitters of the record for review. Changes will be made to the 
record only at the submitters' request. If submitters can no
longer be contacted, GenBank reserves the right to edit an entry
to agree with the information presented in the original
publication(s) cited in the entry.

► Submission of ESTs, STSs and GSSs 

Batches of ESTs (expressed sequence tags), STSs (sequence 
tagged sites), and GSSs (genome survey sequences) can be 
submitted via special streamlined procedures. 

► Submission of HTGS Records 

The NCBI has developed a protocol for high throughput genome 
sequencing centers to use when they submit large genomic 
records (usually cosmids or BACs). Specialized tools, including
fa2htgs and a "genome center version" of Sequin, have been
created to help such centers produce these submission files in a
convenient way. The HTG page not only provides detailed
submission instructions to genome centers, but also informs
GenBank users how to access the HTG sequences.

► Confidentiality 

Some authors are concerned that the appearance of their data in 
GenBank prior to publication will compromise their work. 
GenBank will, upon request, withhold release of new 
submissions for a specified period of time. However, if a paper 
citing the sequence or accession number is published prior to
the specified date, your sequence will be released upon 
publication. 
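
The release rule described above can be sketched in a few lines of
Python: a record is held until the requested date unless a citing paper
is published earlier, in which case it is released upon publication. The
function name and the dates below are hypothetical, and this is only an
illustration of the stated policy, not GenBank's actual release software.

```python
from datetime import date
from typing import Optional


def effective_release_date(hold_until: date,
                           published_on: Optional[date] = None) -> date:
    """Return the date on which a held record would be released.

    Illustrative only: publication before the hold date triggers release.
    """
    if published_on is not None and published_on < hold_until:
        return published_on
    return hold_until


if __name__ == "__main__":
    # Example dates are made up for the illustration.
    print(effective_release_date(date(2002, 12, 31), date(2002, 6, 1)))
    print(effective_release_date(date(2002, 12, 31)))
```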

To prevent delays in the appearance of published
sequence data, we urge authors to inform us of the appearance
of the published data. As soon as it is available, please send the
full publication data (all authors, title, journal, volume, pages, and
date) to the following address:

update@ncbi.nlm.nih.gov 

► Submission of SNPs and other polymorphism 
data 

Data on genetic variation in humans and other organisms can be 
submitted to the NCBI Database of Single Nucleotide 
Polymorphisms (dbSNP). Entries include single nucleotide 
polymorphisms (SNPs), small-scale insertion/deletions, 
polymorphic repetitive elements, and microsatellite variation. 
dbSNP is a separate resource from the GenBank database, and 
submissions do not receive GenBank accessions as noted 
above. However, dbSNP entries do receive dbSNP identifiers 
and contain links to associated GenBank records. Further 
information about submitting data is accessible from the sidebar 
of the dbSNP home page. 


Revised March 22, 2002 